{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Iterative Linear Quadratic Regulator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook Setup \n",
    "The following cell will install Drake, checkout the underactuated repository, and set up the path (only if necessary).\n",
    "- On Google's Colaboratory, this **will take approximately two minutes** on the first time it runs (to provision the machine), but should only need to reinstall once every 12 hours.  Colab will ask you to \"Reset all runtimes\"; say no to save yourself the reinstall.\n",
    "- On Binder, the machines should already be provisioned by the time you can run this; it should return (almost) instantly.\n",
    "\n",
    "More details are available [here](http://underactuated.mit.edu/drake.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    import pydrake\n",
    "    import underactuated\n",
    "except ImportError:\n",
    "    !curl -s https://raw.githubusercontent.com/RussTedrake/underactuated/master/scripts/setup/jupyter_setup.py > jupyter_setup.py\n",
    "    from jupyter_setup import setup_underactuated\n",
    "    setup_underactuated()\n",
    "\n",
    "# Setup matplotlib backend (to notebook, if possible, or inline).  \n",
    "from underactuated.jupyter import setup_matplotlib_backend\n",
    "plt_is_interactive = setup_matplotlib_backend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# python libraries\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib as mpl\n",
    "\n",
    "# pydrake imports\n",
    "from pydrake.all import (Variable, SymbolicVectorSystem, DiagramBuilder,\n",
    "                         LogOutput, Simulator, ConstantVectorSource,\n",
    "                         MathematicalProgram, Solve, SnoptSolver, PiecewisePolynomial)\n",
    "import pydrake.symbolic as sym"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iterative Linear Quadratic Regulator Derivation\n",
    "\n",
    "In this exercise we will derive the iterative Linear Quadratic Regulator (iLQR) solving the following optimization problem.\n",
    "\n",
    "\\begin{align*} \\min_{\\mathbf{u}[\\cdot]} \\quad & \\ell_f(\\mathbf{x}[N]) +\n",
    "      \\sum_{n=0}^{N-1} \\ell(\\mathbf{x}[n],\\mathbf{u}[n]) \\\\ \\text{subject to} \\quad & \\mathbf{x}[n+1] = {\\bf\n",
    "      f}(\\mathbf{x}[n], \\mathbf{u}[n]), \\quad \\forall n\\in[0, N-1] \\\\ & \\mathbf{x}[0] = \\mathbf{x}_0\n",
    "\\end{align*}\n",
    "\n",
    "After completing this exercise you will be able to write your own MPC solver from scratch without any proprietary or third-party software (with the exception of auto-differentiation). You will derive all necessary equations yourself.\n",
    "While the iLQR algorithm will be capable of solving general model predictive control problems in the form described above, we will apply it to the control of a vehicle. \n",
    "\n",
    "### Vehicle Control Problem\n",
    "Before we start the actual derivation of iLQR we will take a look at the vehicle dynamics and cost functions. The vehicle has the following continuous time dynamics and is controlled by longitudinal acceleration and steering velocity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_x = 5\n",
    "n_u = 2\n",
    "def car_continuous_dynamics(x, u):\n",
    "    # x = [x position, y position, heading, speed, steering angle] \n",
    "    # u = [acceleration, steering velocity]\n",
    "    m = sym if x.dtype == object else np # Check type for autodiff\n",
    "    heading = x[2]\n",
    "    v = x[3]\n",
    "    steer = x[4]\n",
    "    x_d = np.array([\n",
    "        v*m.cos(heading),\n",
    "        v*m.sin(heading),\n",
    "        v*m.tan(steer),\n",
    "        u[0],\n",
    "        u[1]        \n",
    "    ])\n",
    "    return x_d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that while the vehicle dynamics are in continuous time, our problem formulation is in discrete time. Define the general discrete time dynamics $\\bf f$ with a simple [Euler integrator](https://en.wikipedia.org/wiki/Euler_method) in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def discrete_dynamics(x, u):\n",
    "    dt = 0.1\n",
    "    # TODO: Fill in the Euler integrator below and return the next state\n",
    "    x_next = x\n",
    "    return x_next"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given an initial state $\\mathbf{x}_0$ and a guess of a control trajectory $\\mathbf{u}[0:N-1]$ we roll out the state trajectory $x[0:N]$ until the time horizon $N$. Please complete the rollout function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rollout(x0, u_trj):\n",
    "    x_trj = np.zeros((u_trj.shape[0]+1, x0.shape[0]))\n",
    "    # TODO: Define the rollout here and return the state trajectory x_trj: [N, number of states]\n",
    "    return x_trj\n",
    "\n",
    "# Debug your implementation with this example code\n",
    "N = 10\n",
    "x0 = np.array([1, 0, 0, 1, 0])\n",
    "u_trj = np.zeros((N-1, n_u))\n",
    "x_trj = rollout(x0, u_trj)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define the stage cost function $\\ell$ and final cost function $\\ell_f$. The goal of these cost functions is to drive the vehicle along a circle with radius $r$ around the origin with a desired speed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r = 2.0\n",
    "v_target = 2.0\n",
    "eps = 1e-6 # The derivative of sqrt(x) at x=0 is undefined. Avoid by subtle smoothing\n",
    "def cost_stage(x, u):\n",
    "    m = sym if x.dtype == object else np # Check type for autodiff\n",
    "    c_circle = (m.sqrt(x[0]**2 + x[1]**2 + eps) - r)**2\n",
    "    c_speed = (x[3]-v_target)**2\n",
    "    c_control= (u[0]**2 + u[1]**2)*0.1\n",
    "    return c_circle + c_speed + c_control\n",
    "\n",
    "def cost_final(x):\n",
    "    m = sym if x.dtype == object else np # Check type for autodiff\n",
    "    c_circle = (m.sqrt(x[0]**2 + x[1]**2 + eps) - r)**2\n",
    "    c_speed = (x[3]-v_target)**2\n",
    "    return c_circle + c_speed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your next task is to write the total cost function of the state and control trajectory. This is simply the sum of all stages over the control horizon and the objective from general problem formulation above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cost_trj(x_trj, u_trj):\n",
    "    total = 0.0\n",
    "    # TODO: Sum up all costs\n",
    "    return total\n",
    "    \n",
    "# Debug your code\n",
    "cost_trj(x_trj, u_trj)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bellman Recursion\n",
    "\n",
    "Now that we are warmed up, let's derive the actual algorithm. We start with the Bellman equation known from lecture defining optimality in a recursively backwards in time.\n",
    "\\begin{align*} V(\\mathbf{x}[n]) = & \\min_{\\mathbf{u}[n]} \\quad \\ell(\\mathbf{x}[n], \\mathbf{u}[n])  + V(\\mathbf{x}[n+1]) \\\\\n",
    "\\end{align*}\n",
    "\n",
    "You may have noticed that we neglected a couple of constraints of the original problem formulation. The fully equivalent formulation is \n",
    "\\begin{align*} \\min_{\\mathbf{u}[n]} \\quad & Q(\\mathbf{x}[n], \\mathbf{u}[n]), \\quad \\forall n\\in[0, N-1]\n",
    "\\\\ \\text{subject to} \\quad \n",
    "& Q(\\mathbf{x}[n], \\mathbf{u}[n]) = \\ell(\\mathbf{x}[n], \\mathbf{u}[n])  + V(\\mathbf{x}[n+1]) \\\\\n",
    "& V(\\mathbf{x}[N]) =   \\ell_f(\\mathbf{x}[N]) \\\\\n",
    "& \\mathbf{x}[n+1] = {\\bf      f}(\\mathbf{x}[n], \\mathbf{u}[n]), \\quad \\\\ \n",
    "& \\mathbf{x}[0] = \\mathbf{x}_0.\n",
    "\\end{align*}\n",
    "The definition of a Q-function will become handy during the derivation of the algorithm.\n",
    "\n",
    "The key idea of iLQR is simple: Approximate the dynamics linearly and the costs quadratically around a nominal trajectory. We will expand all terms of the Q-function accordingly and optimize the resulting quadratic equation for an optimal linear control law in closed form. We will see that by applying the Bellman equation recursively backwards in time, the value function remains a quadratic.\n",
    "The linear and quadratic approximations are computed around the nominal state $\\bf \\bar{x} = \\bf x - \\delta \\bf x$ and the nominal control $\\bf \\bar{u} = \\bf u - \\delta \\bf u$. After applying the Bellman equation backwards in time from time $N$ to $0$ (the backward pass), we will update the nominal controls $\\bf \\bar{u}$ and states $\\bf \\bar{x}$ by applying the computed linear feedback law from the backward pass and rolling out the dynamics from the initial state $\\bf x_0$ to the final horizon $N$. Iterating between backwards and forwards pass optimizes the control problem.\n",
    "\n",
    "### Q-function Expansion\n",
    "\n",
    "Let's start by expanding all terms in the Q-function of the Bellman equation. The quadaratic cost function is\n",
    "\\begin{align*} \n",
    "\\ell(\\mathbf{x}[n], \\mathbf{u}[n]) \\approx \\ell_n \n",
    "+ \\begin{bmatrix}\\ell_{\\mathbf{x},n} \\\\  \\ell_{\\mathbf{u},n} \\end{bmatrix} ^T  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}\n",
    "+ \\frac{1}{2}\\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} ^T \n",
    "\\begin{bmatrix}\\ell_{\\mathbf{xx},n} &  \\ell_{\\mathbf{ux},n}^T\\\\  \n",
    "\\ell_{\\mathbf{ux},n} & \\ell_{\\mathbf{uu},n}\\end{bmatrix}\n",
    " \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix},\n",
    "\\end{align*}\n",
    "and the dynamics function is\n",
    "\\begin{align*} x[n+1]=\n",
    "\\mathbf{f}(\\mathbf{x}[n], \\mathbf{u}[n]) \\approx \\mathbf{f}_n\n",
    "+ \\begin{bmatrix}\\mathbf{f}_{\\mathbf{x},n} & \\mathbf{f}_{\\mathbf{u},n} \\end{bmatrix}  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}.\n",
    "\\end{align*}\n",
    "Here, $\\ell = \\ell(\\bar{\\mathbf{x}}, \\bar{\\mathbf{u}})$ and $\\mathbf{f} = \\mathbf{f}(\\bar{\\mathbf{x}}, \\bar{\\mathbf{u}})$. $\\ell_\\mathbf{x}, \\ell_\\mathbf{u}, \\mathbf{f}_\\mathbf{x}, \\mathbf{f}_\\mathbf{u}$ are the gradients and Jacobians evaluated at $\\bar{\\mathbf{x}}$ and $\\bar{\\mathbf{u}}$. $\\ell_\\mathbf{xx}, \\ell_\\mathbf{ux}, \\ell_\\mathbf{uu}$ are the Hessians at $\\bar{\\mathbf{x}}$ and $\\bar{\\mathbf{u}}$. The expansion of the final cost follows analogously.\n",
    "The code to evaluate all the derivative terms is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class derivatives():\n",
    "    def __init__(self, discrete_dynamics, cost_stage, cost_final, n_x, n_u):\n",
    "        self.x_sym = np.array([sym.Variable(\"x_{}\".format(i)) for i in range(n_x)])\n",
    "        self.u_sym = np.array([sym.Variable(\"u_{}\".format(i)) for i in range(n_u)])\n",
    "        x = self.x_sym\n",
    "        u = self.u_sym\n",
    "        \n",
    "        l = cost_stage(x, u)\n",
    "        self.l_x = sym.Jacobian([l], x).ravel()\n",
    "        self.l_u = sym.Jacobian([l], u).ravel()\n",
    "        self.l_xx = sym.Jacobian(self.l_x, x)\n",
    "        self.l_ux = sym.Jacobian(self.l_u, x)\n",
    "        self.l_uu = sym.Jacobian(self.l_u, u)\n",
    "        \n",
    "        l_final = cost_final(x)\n",
    "        self.l_final_x = sym.Jacobian([l_final], x).ravel()\n",
    "        self.l_final_xx = sym.Jacobian(self.l_final_x, x)\n",
    "        \n",
    "        f = discrete_dynamics(x, u)\n",
    "        self.f_x = sym.Jacobian(f, x)\n",
    "        self.f_u = sym.Jacobian(f, u)\n",
    "    \n",
    "    def stage(self, x, u):\n",
    "        env = {self.x_sym[i]: x[i] for i in range(x.shape[0])}\n",
    "        env.update({self.u_sym[i]: u[i] for i in range(u.shape[0])})\n",
    "        \n",
    "        l_x = sym.Evaluate(self.l_x, env).ravel()\n",
    "        l_u = sym.Evaluate(self.l_u, env).ravel()\n",
    "        l_xx = sym.Evaluate(self.l_xx, env)\n",
    "        l_ux = sym.Evaluate(self.l_ux, env)\n",
    "        l_uu = sym.Evaluate(self.l_uu, env)\n",
    "        \n",
    "        f_x = sym.Evaluate(self.f_x, env)\n",
    "        f_u = sym.Evaluate(self.f_u, env)\n",
    "\n",
    "        return l_x, l_u, l_xx, l_ux, l_uu, f_x, f_u\n",
    "    \n",
    "    def final(self, x):\n",
    "        env = {self.x_sym[i]: x[i] for i in range(x.shape[0])}\n",
    "        \n",
    "        l_final_x = sym.Evaluate(self.l_final_x, env).ravel()\n",
    "        l_final_xx = sym.Evaluate(self.l_final_xx, env)\n",
    "        \n",
    "        return l_final_x, l_final_xx\n",
    "        \n",
    "derivs = derivatives(discrete_dynamics, cost_stage, cost_final, n_x, n_u)\n",
    "# Test the output:\n",
    "# x = np.array([0, 0, 0, 0, 0])\n",
    "# u = np.array([0, 0])\n",
    "# print(derivs.stage(x, u))\n",
    "# print(derivs.final(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Expanding the second term of the Q-function of the Bellman equation, i.e. the value function at the next state $\\mathbf{x}[n+1]$, to second order yields \\begin{align*} \n",
    "V(\\mathbf{x}[n+1]) \\approx V_{n+1} + \n",
    "V_{\\mathbf{x},n+1}^T  \\delta \\mathbf{x}[n+1] + \\frac{1}{2}\\delta \\mathbf{x}[n+1]^T \n",
    "V_{\\mathbf{xx},n+1} \\delta \\mathbf{x}[n+1],\n",
    "\\end{align*}\n",
    "where $\\delta \\mathbf{x}[n+1]$ is given by\n",
    "\\begin{align*} \n",
    "\\delta \\mathbf{x}[n+1] \n",
    "& = \\mathbf{x}[n+1] - \\bar{\\mathbf{x}}[n+1] \\\\\n",
    "& = \\mathbf{f}_n + \\begin{bmatrix}\\mathbf{f}_{\\mathbf{x},n} &  \\mathbf{f}_{\\mathbf{u},n} \\end{bmatrix}  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} -  \\bar{\\mathbf{x}}[n+1] \\\\\n",
    "& = \\mathbf{f}_n + \\begin{bmatrix}\\mathbf{f}_{\\mathbf{x},n} & \\mathbf{f}_{\\mathbf{u},n} \\end{bmatrix}  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} -  \\mathbf{f}(\\bar{\\mathbf{x}}[n], \\bar{\\mathbf{u}}[n]) \\\\\n",
    "& = \\begin{bmatrix}\\mathbf{f}_{\\mathbf{x},n} &  \\mathbf{f}_{\\mathbf{u},n} \\end{bmatrix}   \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}.\n",
    "\\end{align*}\n",
    "\n",
    "We have now expanded all terms of the Bellman equation and can regroup them in the form of\n",
    "\\begin{align*} \n",
    "Q(\\mathbf{x}[n], \\mathbf{u}[n]) & \n",
    "\\approx \\ell_n \n",
    "+ \\begin{bmatrix}\\ell_{\\mathbf{x},n} \\\\  \\ell_{\\mathbf{u},n} \\end{bmatrix} ^T  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}\n",
    "+ \\frac{1}{2}\\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} ^T \n",
    "\\begin{bmatrix}\\ell_{\\mathbf{xx},n} &  \\ell_{\\mathbf{ux},n}^T\\\\  \n",
    "\\ell_{\\mathbf{ux},n} & \\ell_{\\mathbf{uu},n}\\end{bmatrix}\n",
    " \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}, \\\\\n",
    " & \\quad + V_{n+1} + \n",
    "V_{\\mathbf{x},n+1}^T  \\delta \\mathbf{x}[n+1] + \\frac{1}{2}\\delta \\mathbf{x}[n+1]^T \n",
    "V_{\\mathbf{xx},n+1} \\delta \\mathbf{x}[n+1], \\\\\n",
    "& = Q_n \n",
    "+ \\begin{bmatrix} Q_{\\mathbf{x},n} \\\\  Q_{\\mathbf{u},n} \\end{bmatrix} ^T  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}\n",
    "+ \\frac{1}{2}\\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} ^T \n",
    "\\begin{bmatrix} Q_{\\mathbf{xx},n} &  Q_{\\mathbf{ux},n}^T\\\\  \n",
    "Q_{\\mathbf{ux},n} & Q_{\\mathbf{uu},n}\\end{bmatrix}\n",
    " \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}.\n",
    "\\end{align*}\n",
    "Find $Q_{\\mathbf{x},n}$, $Q_{\\mathbf{u},n}$, $Q_{\\mathbf{xx},n}$, $Q_{\\mathbf{ux},n}$, $Q_{\\mathbf{uu},n}$ in terms of $\\ell$ and $\\textbf{f}$ and their expansions by collecitng coefficients in $(\\cdot)\\delta \\mathbf{x}[n]$, $(\\cdot)\\delta \\mathbf{u}[n]$, $1/2 \\delta \\mathbf{x}[n]^T (\\cdot) \\delta \\mathbf{x}[n]$, and similar. Write your results in the corresponding function below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Q_terms(l_x, l_u, l_xx, l_ux, l_uu, f_x, f_u, V_x, V_xx):\n",
    "    # TODO: Define the Q-terms here\n",
    "    Q_x = np.zeros(l_x.shape)\n",
    "    Q_u = np.zeros(l_u.shape)\n",
    "    Q_xx = np.zeros(l_xx.shape)\n",
    "    Q_ux = np.zeros(l_ux.shape)\n",
    "    Q_uu = np.zeros(l_uu.shape)\n",
    "    return Q_x, Q_u, Q_xx, Q_ux, Q_uu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q-function Optimization and Optimal Linear Control Law\n",
    "Amazing! Now that we have the Q-function in quadratic form, we can optimize for the optimal control gains in closed form.\n",
    "The original formulation, i.e. optimizing over $\\mathbf{u}[n]$,\n",
    "\\begin{align*} \\min_{\\mathbf{u}[n]} \\quad & Q(\\mathbf{x}[n], \\mathbf{u}[n]),\n",
    "\\end{align*} is equivalent to optimzing over $\\delta \\mathbf{u}[n]$.\n",
    "\n",
    "\\begin{align*} \n",
    "\\delta \\mathbf{u}[n]^* = {\\arg\\!\\min}_{\\delta \\mathbf{u}[n]} \\quad Q_n \n",
    "+ \\begin{bmatrix} Q_{\\mathbf{x},n} \\\\  Q_{\\mathbf{u},n} \\end{bmatrix} ^T  \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} + \\frac{1}{2}\\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix} ^T \n",
    "\\begin{bmatrix} Q_{\\mathbf{xx},n} &  Q_{\\mathbf{ux},n}^T\\\\  \n",
    "Q_{\\mathbf{ux},n} & Q_{\\mathbf{uu},n}\\end{bmatrix}\n",
    " \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n] \\end{bmatrix}\n",
    " = k + K \\delta \\mathbf{x}[n]\n",
    "\\end{align*} It turns out that the optimal control is linear in $\\delta \\mathbf{x}[n]$.\n",
    "Solve the quadratic optimization analytically and derive equations for the feedforward gains $k$ and feedback gains $K$. Implement the function below. Hint: You do not need to compute $Q_\\mathbf{uu}^{-1}$ by hand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gains(Q_uu, Q_u, Q_ux):\n",
    "    Q_uu_inv = np.linalg.inv(Q_uu)\n",
    "    # TOD: Implement the feedforward gain k and feedback gain K.\n",
    "    k = np.zeros(Q_u.shape)\n",
    "    K = np.zeros(Q_ux.shape)\n",
    "    return k, K"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Value Function Backward Update\n",
    "We are almost done! We need to derive the backwards update equation for the value function. We simply plugin the optimal control $\\delta \\mathbf{u}[n]^* = k + K \\delta \\mathbf{x}[n]$ back into the Q-function which yields the value function\n",
    "\\begin{align*} \n",
    "V(\\mathbf{x}[n]) \\approx V_{n} + \n",
    "V_{\\mathbf{x},n}^T  \\delta \\mathbf{x}[n] + \\frac{1}{2}\\delta \\mathbf{x}[n]^T \n",
    "V_{\\mathbf{xx},n} \\delta \\mathbf{x}[n] = Q_n \n",
    "+ \\begin{bmatrix} Q_{\\mathbf{x},n} \\\\  Q_{\\mathbf{u},n} \\end{bmatrix}^T  \n",
    "\\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n]^* \\end{bmatrix} \n",
    "+ \\frac{1}{2}\\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n]^* \\end{bmatrix}^T \n",
    "\\begin{bmatrix} Q_{\\mathbf{xx},n} &  Q_{\\mathbf{ux},n}^T\\\\  \n",
    "Q_{\\mathbf{ux},n} & Q_{\\mathbf{uu},n}\\end{bmatrix}\n",
    " \\begin{bmatrix} \\delta \\mathbf{x}[n] \\\\ \\delta \\mathbf{u}[n]^* \\end{bmatrix}.\n",
    "\\end{align*}\n",
    "Compare terms in $(\\cdot) \\delta \\mathbf{x}[n]$ and $ 1/2 \\delta \\mathbf{x}[n]^T (\\cdot)  \\delta \\mathbf{x}[n]$, find $V_{\\mathbf{x},n}$, and $V_{\\mathbf{xx},n}$ and implement the corresponding function below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def V_terms(Q_x, Q_u, Q_xx, Q_ux, Q_uu, K, k):\n",
    "    # TODO: Implement V_x and V_xx, hint: use the A.dot(B) function for matrix multiplcation.\n",
    "    V_x = np.zeros(Q_x.shape)\n",
    "    V_xx = np.zeros(Q_xx.shape)\n",
    "    return V_x, V_xx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Expected Cost Reduction\n",
    "We can also estimate by how much we expect to reduce the cost by applying the optimal controls. Simply subtract the previous nominal Q-value ($\\delta \\mathbf{x}[n] = 0$ and $\\delta \\mathbf{u}[n]=0$) from the value function.  The result is implemented below and is a useful aid in checking how accurate the quadratic approximation is during convergence of iLQR and adapting stepsize and regularization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def expected_cost_reduction(Q_u, Q_uu, k):\n",
    "    return -Q_u.T.dot(k) - 0.5 * k.T.dot(Q_uu.dot(k))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Forward Pass\n",
    "We have now have all the ingredients to implement the forward pass and the backward pass of iLQR. In the forward pass, at each timestep the new updated control $\\mathbf{u}' =  \\bar{\\mathbf{u}} + k + K (x' - \\bar{\\mathbf{x}})$ is applied and the dynamis propagated based on the updated control. The nominal control and state trajectory $\\bar{\\mathbf{u}}, \\bar{\\mathbf{x}}$ with which we computed $k$ and $K$ are then updated and we receive a new set of state and control trajectories."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def forward_pass(x_trj, u_trj, k_trj, K_trj):\n",
    "    x_trj_new = np.zeros(x_trj.shape)\n",
    "    x_trj_new[0,:] = x_trj[0,:]\n",
    "    u_trj_new = np.zeros(u_trj.shape)\n",
    "    # TODO: Implement the forward pass here\n",
    "#     for n in range(u_trj.shape[0]):\n",
    "#         u_trj_new[n,:] = # Apply feedback law\n",
    "#         x_trj_new[n+1,:] = # Apply dynamics\n",
    "    return x_trj_new, u_trj_new"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Backward Pass\n",
    "The backward pass starts from the terminal boundary condition $V(\\mathbf{x}[N]) =   \\ell_f(\\mathbf{x}[N])$, such that $V_{\\mathbf{x},N} = \\ell_{\\mathbf{x},f}$ and $V_{\\mathbf{xx},N} = \\ell_{\\mathbf{xx},f}$. In the backwards loop terms for the Q-function at $n$ are computed based on the quadratic value function approximation at $n+1$ and the derivatives and hessians of dynamics and cost functions at $n$. To solve for the gains $k$ and $K$ an inversion of the matrix $Q_\\mathbf{uu}$ is necessary. To ensure invertability and to improve conditioning we add a diagonal matrix to $Q_\\mathbf{uu}$. This is equivalent to adding a quadratic penalty on the distance of the new control trajectory from the control trajectory of the previous iteration. The result is a smaller stepsize and more conservative convergence properties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def backward_pass(x_trj, u_trj, regu):\n",
    "    k_trj = np.zeros([u_trj.shape[0], u_trj.shape[1]])\n",
    "    K_trj = np.zeros([u_trj.shape[0], u_trj.shape[1], x_trj.shape[1]])\n",
    "    expected_cost_redu = 0\n",
    "    # TODO: Set terminal boundary condition here (V_x, V_xx)\n",
    "    V_x = np.zeros((x_trj.shape[1],))\n",
    "    V_xx = np.zeros((x_trj.shape[1],x_trj.shape[1]))\n",
    "    for n in range(u_trj.shape[0]-1, -1, -1):\n",
    "        # TODO: First compute derivatives, then the Q-terms \n",
    "        # l_x, l_u, l_xx, l_ux, l_uu, f_x, f_u = \n",
    "        # Q_x, Q_u, Q_xx, Q_ux, Q_uu =\n",
    "        Q_x = np.zeros((x_trj.shape[1],))\n",
    "        Q_u = np.zeros((u_trj.shape[1],))\n",
    "        Q_xx = np.zeros((x_trj.shape[1], x_trj.shape[1]))\n",
    "        Q_ux = np.zeros((u_trj.shape[1], x_trj.shape[1]))\n",
    "        Q_uu = np.zeros((u_trj.shape[1], u_trj.shape[1]))\n",
    "        # We add regularization to ensure that Q_uu is invertible and nicely conditioned\n",
    "        Q_uu_regu = Q_uu + np.eye(Q_uu.shape[0])*regu\n",
    "        k, K = gains(Q_uu_regu, Q_u, Q_ux)\n",
    "        k_trj[n,:] = k\n",
    "        K_trj[n,:,:] = K\n",
    "        V_x, V_xx = V_terms(Q_x, Q_u, Q_xx, Q_ux, Q_uu, K, k)\n",
    "        expected_cost_redu += expected_cost_reduction(Q_u, Q_uu, k)\n",
    "    return k_trj, K_trj, expected_cost_redu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Main Loop\n",
    "\n",
    "The main iLQR loop consists of iteratively applying the forward and backward pass. The regularization is adapted based on whether the new control and state trajectories improved the cost. We lower the regularization if the total cost was reduced and accept the new trajectory pair. If the total cost did not decrease, the trajectory pair is rejected and the regularization is increased. You may want to test the algorithm with deactivated regularization and observe the changed behavior.\n",
    "The main loop stops if the maximum number of iterations is reached or the expected reduction is below a certain threshold.\n",
    "\n",
    "If you have correctly implemented all subparts of the iLQR you should see that the car plans to drive around the circle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_ilqr(x0, N, max_iter=50, regu_init=100):\n",
    "    # First forward rollout\n",
    "    u_trj = np.random.randn(N-1, n_u)*0.0001\n",
    "    x_trj = rollout(x0, u_trj)\n",
    "    total_cost = cost_trj(x_trj, u_trj)\n",
    "    regu = regu_init\n",
    "    max_regu = 10000\n",
    "    min_regu = 0.01\n",
    "    \n",
    "    # Setup traces\n",
    "    cost_trace = [total_cost]\n",
    "    expected_cost_redu_trace = []\n",
    "    redu_ratio_trace = [1]\n",
    "    redu_trace = []\n",
    "    regu_trace = [regu]\n",
    "    \n",
    "    # Run main loop\n",
    "    for it in range(max_iter):\n",
    "        # Backward and forward pass\n",
    "        k_trj, K_trj, expected_cost_redu = backward_pass(x_trj, u_trj, regu)\n",
    "        x_trj_new, u_trj_new = forward_pass(x_trj, u_trj, k_trj, K_trj)\n",
    "        # Evaluate new trajectory\n",
    "        total_cost = cost_trj(x_trj_new, u_trj_new)\n",
    "        cost_redu = cost_trace[-1] - total_cost\n",
    "        redu_ratio = cost_redu / abs(expected_cost_redu)\n",
    "        # Accept or reject iteration\n",
    "        if cost_redu > 0:\n",
    "            # Improvement! Accept new trajectories and lower regularization\n",
    "            redu_ratio_trace.append(redu_ratio)\n",
    "            cost_trace.append(total_cost)\n",
    "            x_trj = x_trj_new\n",
    "            u_trj = u_trj_new\n",
    "            regu *= 0.7\n",
    "        else:\n",
    "            # Reject new trajectories and increase regularization\n",
    "            regu *= 2.0\n",
    "            cost_trace.append(cost_trace[-1])\n",
    "            redu_ratio_trace.append(0)\n",
    "        regu = min(max(regu, min_regu), max_regu)\n",
    "        regu_trace.append(regu)\n",
    "        redu_trace.append(cost_redu)\n",
    "\n",
    "        # Early termination if expected improvement is small\n",
    "        if expected_cost_redu <= 1e-6:\n",
    "            break\n",
    "            \n",
    "    return x_trj, u_trj, cost_trace, regu_trace, redu_ratio_trace, redu_trace\n",
    "\n",
    "# Setup problem and call iLQR\n",
    "x0 = np.array([-3.0, 1.0, -0.2, 0.0, 0.0])\n",
    "N = 50\n",
    "max_iter=50\n",
    "regu_init=100\n",
    "x_trj, u_trj, cost_trace, regu_trace, redu_ratio_trace, redu_trace = run_ilqr(x0, N, max_iter, regu_init)\n",
    "\n",
    "\n",
    "plt.figure(figsize=(9.5,8))\n",
    "# Plot circle\n",
    "theta = np.linspace(0, 2*np.pi, 100)\n",
    "plt.plot(r*np.cos(theta), r*np.sin(theta), linewidth=5)\n",
    "ax = plt.gca()\n",
    "\n",
    "# Plot resulting trajecotry of car\n",
    "plt.plot(x_trj[:,0], x_trj[:,1], linewidth=5)\n",
    "w = 2.0\n",
    "h = 1.0\n",
    "\n",
    "# Plot rectangles\n",
    "for n in range(x_trj.shape[0]):\n",
    "    rect = mpl.patches.Rectangle((-w/2,-h/2), w, h, fill=False)\n",
    "    t = mpl.transforms.Affine2D().rotate_deg_around(0, 0, \n",
    "            np.rad2deg(x_trj[n,2])).translate(x_trj[n,0], x_trj[n,1]) + ax.transData\n",
    "    rect.set_transform(t)\n",
    "    ax.add_patch(rect)\n",
    "ax.set_aspect(1)\n",
    "plt.ylim((-3,3))\n",
    "plt.xlim((-4.5,3))\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.subplots(figsize=(10,6))\n",
    "# Plot results\n",
    "plt.subplot(2, 2, 1)\n",
    "plt.plot(cost_trace)\n",
    "plt.xlabel('# Iteration')\n",
    "plt.ylabel('Total cost')\n",
    "plt.title('Cost trace')\n",
    "\n",
    "plt.subplot(2, 2, 2)\n",
    "delta_opt = (np.array(cost_trace) - cost_trace[-1])\n",
    "plt.plot(delta_opt)\n",
    "plt.yscale('log')\n",
    "plt.xlabel('# Iteration')\n",
    "plt.ylabel('Optimality gap')\n",
    "plt.title('Convergence plot')\n",
    "\n",
    "plt.subplot(2, 2, 3)\n",
    "plt.plot(redu_ratio_trace)\n",
    "plt.title('Ratio of actual reduction and expected reduction')\n",
    "plt.ylabel('Reduction ratio')\n",
    "plt.xlabel('# Iteration')\n",
    "\n",
    "plt.subplot(2, 2, 4)\n",
    "plt.plot(regu_trace)\n",
    "plt.title('Regularization trace')\n",
    "plt.ylabel('Regularization')\n",
    "plt.xlabel('# Iteration')\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convergence Analysis\n",
    "You can find some plots of the convergence traces captured throughout the iLQR solve process above. The convergence plot indicates that we have achieved superlinear convergence. In fact, iLQR achieves nearly second order convergence. In the case of linear convergence (e.g. gradient descent), the [graph would show a line](https://en.wikipedia.org/wiki/Rate_of_convergence). While the integrated regularization improves robustness it damps convergence in the early iteration steps. \n",
    "\n",
    "In the ideal case, the expected reduction and the actual reduction should be the same, i.e. the reduction ratio remains around 1. If that is the case, the quadratic approximation of costs and linear approximation of the dynamics are very accurate. If the ratio becomes significantly lower than 1, the regularization needs to be increased and thus the stepsize reduced."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Autograding\n",
    "You can check your work by running the following cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from underactuated.exercises.trajopt.ilqr_driving.test_ilqr_driving import TestIlqrDriving\n",
    "from underactuated.exercises.grader import Grader\n",
    "Grader.grade_output([TestIlqrDriving], [locals()], 'results.json')\n",
    "Grader.print_test_results('results.json')"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}